
Introduction

The representation of women in media has long been a topic of interest, as it reflects societal norms and attitudes towards gender equality. Although progress has been made in recent decades, it remains important to examine whether these changes are reflected in the films we watch. Movies provide a unique insight into the subconscious ways in which society is conditioned to view women, and can capture the ideals and norms of the time in which they were produced.

In this data analysis project, we use the CMU Movie Summary Corpus, together with the Stanford CoreNLP-processed summaries and additional data from IMDb, Wikidata, and Box Office Mojo, to explore the portrayal of women in film. This includes examining the roles of actresses, characters, writers, and directors. By analyzing these factors, we aim to gain a deeper understanding of how women have been depicted in media over time and how this representation may have evolved. Our analysis will also allow us to consider the ways in which society views and treats women and the progress made towards gender equality.

The Data

Our analysis is based on merging the CMU Movie Summary Corpus, the Stanford CoreNLP-processed summaries, IMDb, Wikidata, and Box Office Mojo. We have separated the data into three tables: the movies table, the characters table, and the directors and writers table.

The Impact Score metric

Movies

We have created a metric that measures the impact of a movie based on its average rating and its number of votes. Our assumption is that an impactful movie has many votes and an average rating that is either extremely good or extremely bad.

We apply a logarithmic transformation to the number of votes, since vote counts span several orders of magnitude, and then normalize the result so that different movies can be compared. We then take the absolute value of the normalized average rating for each movie. This gives a high value to both very good and very bad movies, as both have a significant impact on audience reception. Multiplying these two factors gives the overall impact a movie has on its audience, which we can compare across films.

\[\textrm{Impact Score}_\textrm{Movies} = \textrm{normalized} (\log(\textrm{number of votes})) \cdot \textrm{abs}(\textrm{normalized}(\textrm{IMDB rating}))\]
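The formula above can be sketched in a few lines of Python. The text does not say which normalization is used, so this sketch makes two assumptions: the log of the vote count is min-max scaled to [0, 1], and the rating is standardized to a z-score so that its absolute value measures distance from the average rating. The function names and toy data are hypothetical, and the resulting values will not match the scale of the scores reported in the table below.

```python
import math
from statistics import mean, pstdev


def min_max(values):
    """Scale values linearly to [0, 1] (assumed normalization for log-votes)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center on the mean and divide by the standard deviation
    (assumed normalization for ratings)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def movie_impact_scores(votes, ratings):
    """normalized(log(votes)) * abs(normalized(rating)), one score per movie."""
    vote_term = min_max([math.log(v) for v in votes])
    rating_term = [abs(z) for z in z_score(ratings)]
    return [v * r for v, r in zip(vote_term, rating_term)]


# Toy data (made up for illustration, not taken from the dataset).
votes = [1000, 100000, 100000, 10]
ratings = [5.0, 9.0, 1.0, 5.0]
scores = movie_impact_scores(votes, ratings)
```

In this toy example, the very good movie (9.0) and the very bad movie (1.0) with the same number of votes receive the same score, while the two average-rated movies score zero regardless of their vote counts.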

According to this metric, these are the top 10 most impactful movies in our dataset:

| Title | Average rating | Number of votes | Impact score |
| --- | --- | --- | --- |
| The Shawshank Redemption | 9.3 | 2648879 | 9.90 |
| The Dark Knight | 9.0 | 2620838 | 8.91 |
| Inception | 8.8 | 2322848 | 8.15 |
| Fight Club | 8.8 | 2093849 | 8.05 |
| Forrest Gump | 8.8 | 2051278 | 8.03 |
| Pulp Fiction | 8.9 | 2027513 | 8.33 |
| The Matrix | 8.7 | 1894094 | 7.64 |
| The Lord of the Rings: The Fellowship of the Ring | 8.8 | 1851387 | 7.93 |
| The Godfather | 9.2 | 1836155 | 9.16 |
| The Lord of the Rings: The Return of the King | 9.0 | 1824685 | 8.53 |

Actors, writers and directors

For actors, writers, and directors, we rank the movies each person is linked to by impact score and compute that person's overall impact as the Discounted Cumulative Gain of the ranked list, so that each additional movie contributes less than the one ranked before it.

\[\textrm{Impact Score}_\textrm{Actors, Directors, Writers} = \sum_{i=1}^{\textrm{number of movies}}\frac{\textrm{Impact Score}_{\textrm{Movies},\, i}}{\log_2(i + 1)}\]
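As a minimal sketch of this sum, the function below takes the impact scores of one person's movies and discounts them by rank. It assumes the movies are ranked from highest to lowest impact score, which is the usual convention for Discounted Cumulative Gain but is not stated explicitly in the text. The function name and the example scores are hypothetical.

```python
import math


def dcg_impact(movie_scores):
    """Discounted Cumulative Gain over one person's movie impact scores.

    Assumption: movies are ranked in descending order of impact score,
    so the best movie sits at rank i = 1 and is divided by log2(2) = 1.
    """
    ranked = sorted(movie_scores, reverse=True)
    return sum(
        score / math.log2(i + 1)
        for i, score in enumerate(ranked, start=1)
    )


# Hypothetical person linked to three movies with these impact scores.
person_score = dcg_impact([2.0, 8.0, 4.0])
```

The best movie counts in full, the second is divided by log2(3), the third by log2(4) = 2, and so on. A long filmography therefore still raises the score, but one strong movie matters more than several weak ones.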

Here are the top 10 actors, writers and directors with the highest impact score:

| Actor | Impact score | Director | Impact score | Writer | Impact score |
| --- | --- | --- | --- | --- | --- |
| Samuel L. Jackson | 47.28 | Steven Spielberg | 35.52 | Stephen King | 35.70 |
| Robert De Niro | 45.92 | Martin Scorsese | 34.01 | George Lucas | 29.18 |
| Michael Caine | 42.68 | Alfred Hitchcock | 30.92 | Christopher Nolan | 29.14 |
| Morgan Freeman | 42.38 | Christopher Nolan | 29.12 | Bob Kane | 28.51 |
| Al Pacino | 39.39 | Francis Ford Coppola | 27.79 | Quentin Tarantino | 27.30 |
| Bruce Willis | 38.88 | Quentin Tarantino | 26.34 | Francis Ford Coppola | 26.90 |
| Gary Oldman | 37.17 | Akira Kurosawa | 24.82 | Akira Kurosawa | 26.66 |
| Robert Duvall | 36.77 | Stanley Kubrick | 24.71 | David S. Goyer | 25.18 |
| Tom Hanks | 36.71 | Clint Eastwood | 23.37 | Billy Wilder | 24.22 |
| Brad Pitt | 36.55 | Uwe Boll | 22.24 | Hayao Miyazaki | 24.02 |

Where are the Women?

As our project focuses on the representation of women in movies, we start by looking at the percentage of actresses per genre and per decade.

When it comes to genre, women are most often represented in dramas, comedies, and romances, while they are underrepresented in action-adventure and sci-fi films.
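The percentage of actresses per genre and per decade could be computed along the following lines. This is only a sketch: the record layout `(genre, release_year, gender)`, the `"F"` gender code, and the sample rows are assumptions made for illustration and do not reflect the actual schema or contents of our tables.

```python
from collections import defaultdict


def actress_share(records):
    """Percentage of female cast entries per (genre, decade).

    Each record is a hypothetical (genre, release_year, gender) tuple,
    one per actor or actress credited in a movie of that genre.
    """
    totals = defaultdict(int)
    women = defaultdict(int)
    for genre, year, gender in records:
        key = (genre, (year // 10) * 10)  # e.g. 1994 -> 1990
        totals[key] += 1
        if gender == "F":
            women[key] += 1
    return {key: 100.0 * women[key] / totals[key] for key in totals}


# Made-up sample rows, not taken from the dataset.
records = [
    ("Drama", 1994, "F"),
    ("Drama", 1999, "M"),
    ("Action", 2008, "M"),
    ("Action", 2001, "M"),
    ("Action", 2005, "F"),
    ("Action", 2003, "M"),
]
shares = actress_share(records)
```

With these sample rows, dramas in the 1990s come out at 50% actresses and action films in the 2000s at 25%.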

When considering the representation of women among directors and writers, we found that the pattern is similar, although the overall percentage of women in these roles is significantly lower than for actresses.

Behind the Camera

Put the director analysis here. Who are the most impactful directors for the most common genres? What are their best movies? What are they about? How does the impact of female directors compare with that of male directors?

In Front of the Camera

Put the Actor analysis here

On the Screen

Put the character analysis here

Women of Impact

Despite the challenges facing women in the film industry, there are many women who have made a significant impact and achieved great success in their roles.

These women have not only excelled in their careers, but have also challenged stereotypes and paved the way for future generations of women in media.

That’s a Wrap!

In conclusion, the representation of women in media is limited and often stereotypical. However, there are many talented and successful women in the industry who are making a significant impact. It is important for the industry to continue to strive for greater diversity and representation, in order to create a more accurate and fair portrayal of women in media.